TensorFlow Basics
R Interfaces to TensorFlow
Deep Learning
Supporting Tools
Deployment
Learning More
Supports distributed execution and very large datasets
Supports automatic differentiation
Robust foundation for many deep learning applications
TensorFlow models can be deployed with a low-latency C++ runtime
R has a lot to offer as an interface language for TensorFlow
# Greta
theta <- normal(0, 32, dim = 2)
mu <- alpha + beta * Z
X <- normal(mu, sigma)
p <- ilogit(theta[1] + theta[2] * X)
distribution(y) <- binomial(n, p)
# BUGS/JAGS
for(j in 1 : J) {
y[j] ~ dbin(p[j], n[j])
logit(p[j]) <- theta[1] + theta[2] * X[j]
X[j] ~ dnorm(mu[j], tau)
mu[j] <- alpha + beta * Z[j]
}
theta[1] ~ dnorm(0.0, 0.001)
theta[2] ~ dnorm(0.0, 0.001)
| Dimension | R object |
|---|---|
| 0D | 42 |
| 1D | c(42, 42, 42) |
| 2D | matrix(42, nrow = 2, ncol = 2) |
| 3D | array(42, dim = c(2,3,2)) |
| 4D | array(42, dim = c(2,3,2,3)) |
Vector data—2D tensors of shape (samples, features)
Timeseries or sequence data—3D tensors of shape (samples, timesteps, features)
Images—4D tensors of shape (samples, height, width, channels) 
Video—5D tensors of shape (samples, frames, height, width, channels)
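These shape conventions map directly onto base R arrays. A minimal sketch (the sizes here are made-up placeholders, not from any real dataset):

```r
# Placeholder tensors built with base R, one per shape convention above
vector_data <- matrix(0, nrow = 10, ncol = 4)       # (samples, features)
sequences   <- array(0, dim = c(10, 20, 4))         # (samples, timesteps, features)
images      <- array(0, dim = c(32, 28, 28, 1))     # (samples, height, width, channels)
video       <- array(0, dim = c(4, 16, 28, 28, 1))  # (samples, frames, height, width, channels)

# length(dim(x)) gives the number of axes, i.e. the "D" in the table above
length(dim(images))  # 4
length(dim(video))   # 5
```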
head(data.matrix(iris), n = 10)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 [1,]          5.1         3.5          1.4         0.2       1
 [2,]          4.9         3.0          1.4         0.2       1
 [3,]          4.7         3.2          1.3         0.2       1
 [4,]          4.6         3.1          1.5         0.2       1
 [5,]          5.0         3.6          1.4         0.2       1
 [6,]          5.4         3.9          1.7         0.4       1
 [7,]          4.6         3.4          1.4         0.3       1
 [8,]          5.0         3.4          1.5         0.2       1
 [9,]          4.4         2.9          1.4         0.2       1
[10,]          4.9         3.1          1.5         0.1       1
High-level R interfaces for neural nets and traditional models
Low-level interface to enable new applications (e.g. Greta)
Tools to facilitate a productive workflow and experiment management
Easy access to GPUs for training models
Breadth and depth of educational resources
High-level neural networks API capable of running on top of TensorFlow, CNTK, or Theano (and soon MXNet).
Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
Supports both convolutional networks and recurrent networks, as well as combinations of the two.
Runs seamlessly on CPU and GPU.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
input_shape = input_shape) %>%
layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 10, activation = 'softmax')
library(keras)

# Load MNIST images datasets (built in to Keras)
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()

# Flatten images and transform RGB values into [0,1] range
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255
x_test <- x_test / 255

# Convert class vectors to binary class matrices
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential() %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')
model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
summary(model)
_________________________________________________________________
Layer (type)                     Output Shape                  Param #
=================================================================
dense_1 (Dense)                  (None, 256)                   200960
_________________________________________________________________
dropout_1 (Dropout)              (None, 256)                   0
_________________________________________________________________
dense_2 (Dense)                  (None, 128)                   32896
_________________________________________________________________
dropout_2 (Dropout)              (None, 128)                   0
_________________________________________________________________
dense_3 (Dense)                  (None, 10)                    1290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
_________________________________________________________________
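The Param # column above can be checked by hand: a dense layer has inputs × units weights plus units biases, while dropout layers add no parameters. A quick base R sketch of that arithmetic:

```r
# Parameters of a dense layer: one weight per (input, unit) pair, plus one bias per unit
dense_params <- function(inputs, units) inputs * units + units

dense_params(784, 256)  # dense_1: 200960
dense_params(256, 128)  # dense_2: 32896
dense_params(128, 10)   # dense_3: 1290

# Total matches the model summary
dense_params(784, 256) + dense_params(256, 128) + dense_params(128, 10)  # 235146
```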
history <- model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 30,
  validation_split = 0.2
)
history
Trained on 48,000 samples, validated on 12,000 samples (batch_size=128, epochs=30)
Final epoch (plot to see history):
acc: 0.9057
loss: 1.5
val_acc: 0.9317
val_loss: 1.088
plot(history)
model %>% evaluate(x_test, y_test)
$loss
[1] 0.1078904

$acc
[1] 0.9815
model %>% predict_classes(x_test[1:100,])
  [1] 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7
 [36] 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0
 [71] 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9
| Estimator | Description |
|---|---|
| linear_regressor() | Linear regressor model. |
| linear_classifier() | Linear classifier model. |
| dnn_regressor() | Deep neural network regression. |
| dnn_classifier() | Deep neural network classification. |
| dnn_linear_combined_regressor() | DNN Linear Combined Regression. |
| dnn_linear_combined_classifier() | DNN Linear Combined Classification. |
library(tensorflow)

# Some synthetic data: targets generated from y = 0.1 * x + 0.3
x_data <- runif(100, min = 0, max = 1)
y_data <- x_data * 0.1 + 0.3

W <- tf$Variable(tf$random_uniform(shape(1L), -1.0, 1.0))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- W * x_data + b
loss <- tf$reduce_mean((y - y_data) ^ 2)
optimizer <- tf$train$GradientDescentOptimizer(0.5)
train <- optimizer$minimize(loss)
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
for (step in 1:200) {
sess$run(train)
if (step %% 20 == 0)
cat(step, "-", sess$run(W), sess$run(b), "\n")
}
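As a sanity check on what the TensorFlow graph above computes, the same least-squares fit can be sketched with plain gradient descent in base R (assuming, as in the classic getting-started example, data generated from y = 0.1 * x + 0.3):

```r
set.seed(42)
x_data <- runif(100)
y_data <- x_data * 0.1 + 0.3

W <- runif(1, -1, 1)  # random initial slope, as in the TF example
b <- 0
lr <- 0.5             # same learning rate as GradientDescentOptimizer(0.5)

for (step in 1:200) {
  y_hat <- W * x_data + b
  # gradients of mean((y_hat - y_data)^2) with respect to W and b
  grad_W <- mean(2 * (y_hat - y_data) * x_data)
  grad_b <- mean(2 * (y_hat - y_data))
  W <- W - lr * grad_W
  b <- b - lr * grad_b
}

c(W, b)  # converges toward 0.1 and 0.3
```

The difference is that TensorFlow derives these gradients for you via automatic differentiation, rather than requiring them to be written out by hand.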
Successful deep learning requires a huge amount of experimentation
This requires a systematic approach to conducting experiments and tracking their results
The training_run() function is like the source() function, but it automatically tracks and records output and metadata for the execution of the script:
library(tfruns)
training_run("mnist_mlp.R")
ls_runs()
Data frame: 4 x 28
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T21-01-11Z 0.1485 0.9562 0.2577 0.9240 0.1482 0.9545
2 runs/2017-12-09T21-00-11Z 0.1438 0.9573 0.2655 0.9208 0.1505 0.9559
3 runs/2017-12-09T19-59-44Z 0.1407 0.9580 0.2597 0.9241 0.1402 0.9578
4 runs/2017-12-09T19-56-48Z 0.1437 0.9555 0.2610 0.9227 0.1459 0.9551
ls_runs(eval_acc > 0.9570, order = eval_acc)
Data frame: 2 x 28
run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T19-59-44Z 0.9580 0.1407 0.2597 0.9241 0.1402 0.9578
2 runs/2017-12-09T21-00-11Z 0.9573 0.1438 0.2655 0.9208 0.1505 0.9559
# define flags and their defaults
FLAGS <- flags(
flag_integer("dense_units1", 128),
flag_numeric("dropout1", 0.4),
flag_integer("dense_units2", 128),
flag_numeric("dropout2", 0.3)
)
# use flag
layer_dropout(rate = FLAGS$dropout1)
# train with flag
training_run("mnist_mlp.R", flags = list(dropout1 = 0.3))
# run various combinations of dropout1 and dropout2
runs <- tuning_run("mnist_mlp.R", flags = list(
dropout1 = c(0.2, 0.3, 0.4),
dropout2 = c(0.2, 0.3, 0.4)
))
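tuning_run() trains one model per combination of flag values, so the 3 × 3 grid of dropout rates above produces 9 runs. A minimal base R illustration of what that grid expands to:

```r
# The flag lists are crossed, one run per row of the resulting grid
grid <- expand.grid(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
)

nrow(grid)  # 9
```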
# find the best evaluation accuracy
runs[order(runs$eval_acc, decreasing = TRUE), ]
Data frame: 9 x 28
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z 0.1002 0.9817 0.0346 0.9900 0.1086 0.9794
6 runs/2018-01-26T13-23-26Z 0.1133 0.9799 0.0409 0.9880 0.1236 0.9778
5 runs/2018-01-26T13-24-11Z 0.1056 0.9796 0.0613 0.9826 0.1119 0.9777
4 runs/2018-01-26T13-24-57Z 0.1098 0.9788 0.0868 0.9770 0.1071 0.9771
2 runs/2018-01-26T13-26-28Z 0.1185 0.9783 0.0688 0.9819 0.1150 0.9783
3 runs/2018-01-26T13-25-43Z 0.1238 0.9782 0.0431 0.9883 0.1246 0.9779
8 runs/2018-01-26T13-21-53Z 0.1064 0.9781 0.0539 0.9843 0.1086 0.9795
7 runs/2018-01-26T13-22-40Z 0.1043 0.9778 0.0796 0.9772 0.1094 0.9777
1 runs/2018-01-26T13-27-14Z 0.1330 0.9769 0.0957 0.9744 0.1304 0.9751
Recommended reading
Keras for R cheatsheet
Gallery and examples
Subscribe to the TensorFlow for R blog!
TensorFlow for R: https://tensorflow.rstudio.com
Stay up to date at: https://tensorflow.rstudio.com/blog/
Questions?
# Modify model object in place (note that it is not assigned back to a variable)
model %>% compile(
optimizer = 'rmsprop',
loss = 'binary_crossentropy',
metrics = c('accuracy')
)
Keras models are directed acyclic graphs of layers whose state is updated during training.
Keras layers can be shared by multiple parts of a Keras model.